A Scalable System for Embedded Large Vocabulary Continuous Speech Recognition
Abstract
This paper presents a system for large vocabulary continuous speech recognition under constrained hardware resources. We investigate efficient pruning and caching strategies for handling extensive acoustic and linguistic models. Software components are analyzed in terms of resource consumption. We then evaluate system performance in extreme configurations in which the acoustic and linguistic models are dramatically pruned. Results show that the proposed system design allows the use of large HMM-based acoustic models and trigram language models while decoding very fast, under 0.6 times real time on a standard desktop computer, and preserving transcript relevance.
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Parallelized Viterbi processor for 5,000-word large-vocabulary real-time continuous speech recognition FPGA system
We propose a novel Viterbi processor for large-vocabulary real-time continuous speech recognition. This processor is built with multiple Viterbi cores. Since each core can compute independently, these cores reduce the cycle times very efficiently. To verify the effect of utilizing multiple cores, we implement a dual-core Viterbi processor in an FPGA and achieve 49% cycle-time reduction, compared ...
Profiling large-vocabulary continuous speech recognition on embedded devices: a hardware resource sensitivity analysis
When deployed in embedded systems, speech recognizers are necessarily reduced from large-vocabulary continuous speech recognizers (LVCSR) found on desktops or servers to fit the limited hardware. However, embedded hardware continues to evolve in capability; today’s smartphones are vastly more powerful than their recent ancestors. This begets a new question: which hardware features not currently...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Publication date: 2007